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Abstract — We present a novel statistically-based discretization 
paradigm and derive a class of maximum a posteriori (MAP) 
estimators for solving ill-conditioned linear inverse problems. 
We are guided by the theory of sparse stochastic processes, 
which specifies continuous-domain signals as solutions of linear 
stochastic differential equations. Accordingly, we show that the 
class of admissible priors for the discretized version of the signal 
is confined to the family of infinitely divisible distributions. 
Our estimators not only cover the well-studied methods of 
Tikhonov and ^i-type regularizations as particular cases, but 
also open the door to a broader class of sparsity-promoting 
regularization schemes that are typically nonconvex. We provide 
an algorithm that handles the corresponding nonconvex problems 
and illustrate the use of our formalism by applying it to decon- 
volution, MRI, and X-ray tomographic reconstruction problems. 
Finally, we compare the performance of estimators associated 
with models of increasing sparsity. 

Index Terms — Innovation models, MAP estimation, non- 
Gaussian statistics, sparse stochastic processes, sparsity- 
promoting regularization, nonconvex optimization. 

I. Introduction 

WE consider linear inverse problems that occur in a 
variety of biomedical imaging applications (Tl-B). 
In this class of problems, the measurements y are obtained 
through the forward model 

y = Hs + n, (1) 

where s represents the true signal/image. The linear operator 
H models the physical response of the acquisition/imaging 
device and n is some additive noise. A conventional approach 
for reconstructing s is to formulate the reconstructed signal s* 
as the solution of the optimization problem 

s* = axg min X>(s; y) + XTl(s), (2) 

s 

where 2?(s;y) quantifies the distance separating the recon- 
struction from the observed measurements, lZ(s) measures the 
regularity of the reconstruction, and A > is the regularization 
parameter. 

In the classical quadratic (Tikhonov-type) reconstruction 
schemes, one utilizes £ 2 -norms for measuring both the data 
consistency and the reconstruction regularity p). In a general 
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setting, this leads to a smooth optimization problem of the 
form 

s* = argmin||y - Hs||| + A||Rs||l, (3) 

S 

where R is a linear operator and the formal solution is given 
by 

s* = (H T H + AR T R) 1 H T y. (4) 

The linear reconstruction framework expressed in (f3])-(|4| can 
also be derived from a statistical perspective. Under the 
hypothesis that s follows a multivariate zero-mean Gaussian 
distribution with covariance matrix C ss = E{ss T }, the opera- 

— 1/2 

tor C ss whitens s (i.e., renders its components independent). 
Moreover, if n is additive white Gaussian noise (AWGN) of 
variance a 2 , the maximum a posteriori (MAP) formulation of 
the reconstruction problem yields 

smap = (H T H + ^C" 1 )-^^, (5) 

which is equal to Q when CsV^ 2 = R- and a 2 = A. In the 
Gaussian scenario, the MAP estimator is known to yield the 
minimum mean square error (MMSE) solution. The equivalent 
Wiener solution |5]) is also applicable for non-Gaussian models 
with known covariance C ss and is commonly referred to as 
the linear minimum mean square error (LMMSE) Q. 

In recent years, the paradigm in variational formulations 
for signal reconstruction has shifted from the classical linear 
schemes to the sparsity-promoting methods motivated by the 
observation that many signals that occur naturally have sparse 
or nearly-sparse representations in some transform domain |5|. 
The promotion of sparsity is achieved by specifying well- 
chosen non-quadratic regularization functionals and results in 
nonlinear reconstruction. One common choice for the regu- 
larization functional is lZ(v) = ||v||i, where v = W _1 s 
represents the coefficients of a wavelet (or a wavelet-like 
multiscale) transform |6). An alternative choice is IZ(s) = 
||Ls||i, where L is the discrete version of the gradient or 
Laplacian operator, with the gradient one being known as total- 
variation (TV) regularization [7|. Although using the l\ norm 
as regularization functional has been around for some time 
(for instance, see |8), |9j), it is currently at the heart of sparse 
signal reconstruction problems. Consequently, a significant 
amount of research is dedicated to the design of efficient 
algorithms for nonlinear reconstruction methods [10]. 

The current formulations of sparsity-promoting regulariza- 
tion are based on solid variational principles and are predomi- 
nantly deterministic. They can also be interpreted in statistical 
terms as MAP estimators by considering generalized Gaussian 
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or Laplace priors (TT)-|[T3j. These models, however, are 
tightly linked to the choice of a given sparsifying transform, 
with the downside that they do not provide further insights on 
the true nature of the signal. 
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A. Contributions 

In this paper, we revisit the signal reconstruction problem 
by specifying upfront a continuous-domain model for the 
signal that is independent from the subsequent reconstruc- 
tion task/algorithm and apply a proper discretization scheme 
to derive the corresponding MAP estimator. Our approach 
builds upon the theory of continuous-domain sparse stochastic 
processes fl4) . In this framework, the stochastic process is 
defined through an innovation model that can be driven by 
a non-Gaussian excitation Q The primary advantage of our 
continuous-domain formulation is that it lends itself to an 
analytical treatment. In particular, it allows for the derivation 
of the probability density function (pdf) of the signal in any 
transform domain, which is typically much more difficult in a 
purely discrete framework. Remarkably, the underlying class 
of models also provides us with a strict derivation of the class 
of admissible regularization functionals which happen to be 
confined to two categories: Gaussian or sparse. 

The main contributions of the present work are as follows: 

■ The introduction of continuous-domain stochastic models 
in the formulation of inverse problems. This leads to the 
use of non-quadratic reconstruction schemes. 

■ A general scheme for the proper discretization of these 
problems. This scheme specifies feasible statistical esti- 
mators. 

■ The characterization of the complete class of admissi- 
ble potential functions (prior log-likelihoods) and the 
derivation of the corresponding MAP estimators. The 
connections between these estimators and the existing 
deterministic methods such as TV and £i regularizations 
are also explained. 

■ A general reconstruction algorithm, based on variable- 
splitting techniques, that handles different estimators, 
including the nonconvex ones. The algorithm is applied 
to deconvolution and to the reconstruction of MR and 
X-ray images. 

B. Outline 

The paper is organized as follows: In Section [II] we explain 
the acquisition model and obtain the corresponding represen- 
tation of the signal s and the system matrix H. In Section III 
we introduce the continuous-domain innovation model that 
defines a generalized stochastic process. We then statistically 
specify the discrete-domain counterpart of the innovation 
model and characterize admissible prior distributions. Based 
on this characterization, we derive the MAP estimation as an 
optimization problem in Section IV In Section [VJ we provide 
an efficient algorithm to solve the optimization problem for 
a variety of admissible priors. Finally, in Section VI 



we 

illustrate our discretization procedure by applying it to a series 

'it is noteworthy that the theory includes the stationary Gaussian processes. 



Fig. 1. General form of the linear, continuous-domain measurement model 
considered in this paper. The signal s(x) is acquired through linear measure- 
ments of the form z m = [Wa] = {s,ip m ). The resulting vector z S R 
is corrupted with AWGN. Our goal is to estimate the original signal s from 
noisy measurements y by exploiting the knowledge that s is a realization of 
a sparse stochastic process that satisfies the innovation model hs = w, where 
to is a non-Gaussian white innovation process. 



of deconvolution and of MR and X-ray image-reconstruction 
problems. This allows us to compare the effect of different 
sparsity priors on the solution. 

C. Notations 

Throughout the paper, we assume that the measurement 
noise is AWGN of variance a 2 . The input argument x G K d 
of the continuous-domain signals is written inside parenthesis 
(e.g., s(x)) whereas, for discrete-domain signals, we employ 
k € Z d and use brackets (e.g., s[k}). The scalar product is 
represented by (■, •) and #(•) denotes the Dirac impulse. 

II. Measurement Model 

In this section, we develop a discretization scheme that 
allows us to obtain a tractable representation of continuously- 
defined measurement problem, with minimal loss of informa- 
tion. Such discrete representation is crucial since the resulting 
reconstruction algorithms are implemented numerically. 

A. Discretization of the Signal 

To obtain a clean analytical discretization of the problem, 
we consider the generalized sampling approach using "shift- 
invariant" reconstruction spaces fl5) . The advantage of such 
a representation is that it offers the same type of error control 
as finite-element methods (i.e., one can make the discretiza- 
tion error arbitrarily small by making the reconstruction grid 
sufficiently fine). 

The idea is to represent the signal s by projecting it onto 
a reconstruction space. We define our reconstruction space at 
resolution T as 



s T (x) = ^ s[k] fint f~ - ft) : s[k] g 4o(Z 



(6) 



where s[k] — s(x)\ x= Tk, and Lp- lnt is an interpolating basis 
function positioned on the reconstruction grid TZ d . The 
interpolation property is ip- m t(k) = 5[k]. For the representation 
of s in terms of its samples s[k] to be stable and unambiguous, 
</?i n t has to be a valid Riesz basis for Vr(y>int)- Moreover, to 
guarantee that the approximation error decays as a function 
of T, the basis function should satisfy the partition of unity 
property f!3) 



fcez d 



tp- m t(x - k) = 1, yx e 



(7) 
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The projection of the signal onto the reconstruction space 
Vr(Vint) is given by 

P Vt s(x) = s(Tk)cp int (| - fc), (8) 



with the property that Pv T Pv T s(x) — Pv T s(x) (since Py T is 
a projection operator). To simplify the notation, we shall use 
a unit sampling T = 1 with the implicit assumption that the 
sampling error is negligible. (If the sampling error is large, 
one can use a finer sampling and rescale the reconstruction 
grid appropriately.) Thus, the resulting discretization is 



s 1 (x) = P Vl s{x) = s[k](f int (x - k). 



(9) 



To summarize, si(x) is the discretized version of the original 
signal s(x) and it is uniquely described by the samples 
s[k] = s(x)\ x= k for k £ Z d . The main point is that the 
reconstructed signal is represented in terms of samples even 
though the problem is still formulated in the continuous- 
domain. 

B. Discrete Measurement Model 

By using the discretization scheme in we are now ready 
to formally link the continuous model in Figure [T] and the 
corresponding discrete linear-inverse problem. Although the 
signal representation (|9]l is an infinite sum, in practice we 
restrict ourselves to a subset of TV basis functions with k £ fl, 
where 51 is a discrete set of integer coordinates in a region of 
interest (ROI). Hence, we rewrite |9) as 



si{x) 



ken 



s[k]ip k (x), 



(10) 



where (fik{x) corresponds to Lp- ln t(x — k) up to modifications 
at the boundaries (periodization or Neumann boundary condi- 
tion). 

We first consider a noise-free signal acquisition. The general 
form of a linear, continuous-domain noise-free measurement 
system is 



s(x)^ m (x)dx, (to = 1, ... , M) 



(11) 



where s(x) is the original signal, and the measurement 
function tj) m {x) represents the spatial response of the mth 
detector which is application dependent as we shall explain in 
Section [Vl] 

By substituting the signal representation |9) into (jTTJ, we 
discretize the measurement model and write it in matrix-vector 
form as 

y = z + n = Hs + n, (12) 

where y is the A/-dimensional measurement vector, s = 
{s[k]) ken is the iV-dimensional signal vector, n is the M- 
dimensional noise vector, and H is the M x N system matrix 
whose entry (m, k) is given by 

[ H ]m k = (V'm, <Pk) = / ^ m (x)(pk(x)dx. (13) 

This allows us to specify the discrete linear forward model 
given in ([TJ which is compatible with the continuous-domain 



formulation. The solution of this problem yields the represen- 
tation si(x) of s(x) which is parameterized in terms of the 
signal samples s. Having the forward model explained, our 
next aim is to obtain the statistical distribution of s. 



III. Sparse Stochastic Models 

We now proceed by introducing our stochastic framework 
which will provide us with a signal prior. For that purpose, we 
assume that s(x) is a realization of a stochastic process that 
is defined as the solution of a linear stochastic differential 
equation (SDE) with a driving term that is not necessarily 
Gaussian. Starting from such a continuous-domain model, we 
aim at obtaining the statistical distribution of the sampled 
version of the process (discrete signal) that will be needed 
to formulate estimators for the reconstruction problem. 

A. Continuous-Domain Innovation Model 

As mentioned in Section [TJ we specify our relevant class of 
signals as the solution of an SDE in which the process s is 
assumed to be whitened by a linear operator. This model takes 
the form 

Ls = w, (14) 

where w is a continuous-domain white innovation process (the 
driving term), and L is a (multidimensional) differential oper- 
ator. The right-hand side of ( p"4| ) represents the unpredictable 
part of the process, while L is called the whitening operator. 
Such models are standard in the classical theory of stationary 
Gaussian processes p6[ . The twist here is that the driving 
term w is not necessarily Gaussian. Moreover, the underlying 
differential system is potentially unstable to allow for self- 
similar models. 

In the present model, the process s is characterized by the 
formal solution s = L~ 1 w, where L _1 is an appropriate right 
inverse of L. The operator L _1 amounts to some generalized 
"integration" of the innovation w. The implication is that the 
correlation structure of the stochastic process s is determined 
by the shaping operator L _1 , whereas its statistical properties 
and sparsity structure is determined by the driving term w. 
As an example in the one-dimensional setting, the operator L 
can be chosen as the first-order continuous -domain derivative 
operator L = D. For multidimensional signals, an attractive 
class of operators is the fractional Laplacian (— A) = which is 
invariant to translation, dilation, and rotation in R d [17|. This 
operator gives rise to "l/||u;|| 7 "-type power spectrum and is 



frequently used to model certain types of images fTS) , 1 19 1. 

The mathematical difficulty is that the innovation w cannot 
be interpreted as an ordinary function because it is highly 
singular. The proper framework for handling such singular ob- 
jects is Gelfand and Vilenkin's theory of generalized stochastic 
processes |20|. In this framework, the stochastic process s is 
observed by means of scalar-products (s, ip) with ip £ S(M. d ), 
where 6>(R d ) denotes the Schwartz class of rapidly decreasing 
test functions. Intuitively, this is analogous to measuring an 
intensity value at a pixel after integration through a CCD 
detector. 
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A fundamental aspect of the theory is that the driving term 
w of the innovation model ( fBI is uniquely specified in terms 
of its Levy exponent /(•). 

Definition 1. A complex-valued function f : K — > C is a valid 
Levy exponent iff. it satisfies the three following conditions: 

1) it is continuous; 

2) it vanishes at the origin; 

3) it is conditionally positive-definite of order one in the 
sense that 

N N 



m— 1 n— 1 



under the condition Y] _-, £ m = for every possible 
choice of uj\, . . . ,un € K» £ij • • • ,£n € C, and N € 
N\{0}. 

An important subset of Levy exponents are the p-admissible 
ones, which are central to our formulation. 

Definition 2. A Levy exponent f with derivative /' is called 
p-admissible if it satisfies the inequality 

+ MI/( W )| < cm* 

for some constant C > and < p < 2. 

A typical example of a p-admissible Levy exponent is 
f(ui) = — so\uj\ a with so > 0. The simplest case is 
/Gauss (w) = — ||w| 2 ; it will be used to specify Gaussian 
processes. 

Gelfand and Vilenkin have characterized the whole class of 
continuous-domain white innovation and have shown that they 
are fully specified by the generic characteristic form 



f(<p(x))dx ), 



exp 



(15) 



where / is the corresponding Levy exponent of the innovation 
process w. The powerful aspect of this characterization is that 
8P W is indexed by a test function ip E S rather than by a 
scalar (or vector) Fourier variable ui. As such, it constitutes 
the infinite-dimensional generalization of the characteristic 
function of a conventional random variable. 

Recently, Unser et al. characterized the class of stochastic 
processes that are solutions of ([14} where L is a linear shift- 
invariant (LSI) operator and w is a member of the class of 



so-called Levy noises 1 14 Theorem 3]. 



Theorem 1. Let w be a Levy noise as specified by ( |15| l and 

L _1 * be a left inverse of the adjoint operator L* such that 
either one of the conditions below is met: 

1) L^ 1 * is a continuous linear map from S(K ) into itself; 

2) / is p-admissible and L~ x * is a continuous linear map 
from S(R d ) into L p (R d ); that is, 

\\L- u tp\\ Lv <C|M| V y^eS(R d ) 

for some constant C and some p > 1. 



Then, s = L~ 1 w is a well-defined generalized stochastic 
process over the space of tempered distributions S' (R d ) and 
is uniquely characterized by its characteristic form 

& s {<p) =E{e i<s *' > } =exp ^ f(L- u ip(x))dx\ (16) 

It is a (weak) solution of the stochastic differential equation 
Ls = w in the sense that (Ls, ip) — (w, ip) for all (/? € ^(IR^). 

Before we move on, it is important to emphasize that Levy 
exponents are in one-to-one correspondence with the so-called 
infinitely divisible (i.d.) distributions [21]. 



Definition 3. A generic pdf px is infinitely divisible if, for 
any positive integer n, it can be represented as the n-fold 
convolution (p * ■ ■ ■ * p) where p is a valid pdf. 

Theorem 2 (Levy-Schoenberg). Let px{u) = E{c j " x } = 
J R e^ x px{x) dx be the characteristic function of an infinitely 
divisible random variable X. Then, 

f(oj) = logpx(w) 

is a Levy exponent in the sense of Definition [7] Conversely, if 
f(uj) is a valid Levy exponent, then the inverse Fourier integral 

Px{x) 



2ir 

yields the pdf of an i.d. random variable. 

Another important theoretical result is that it is possible 
to specify the complete family of i.d. distributions thanks 
to the celebrated Levy-Khintchine representation (22) which 
provides a constructive method for defining Levy exponents. 
This tight connection will be essential for our formulation and 
limits us to a certain family of prior distributions. 

B. Statistical Distribution of Discrete Signal Model 

The interest is now to statistically characterize the dis- 



cretized signal described in Section II-A To that end, the 



first step is to formulate a discrete version of the continuous- 
domain innovation model ( |14) , Since, in practical applications, 
we are only given the samples s[fc]fe £ si of the signal, we 
obtain the discrete-domain innovation model by applying to 
them the discrete counterpart Ld of the whitening operator L. 
The fundamental requirement for our formulation is that the 
composition of Ld and L _1 results in a stable, shift-invariant 
operator whose impulse response is well localized ]23| 



(L d L-M) (x) =p\(x) eL 1 (R d ). 



(17) 



The function /?l is the generalized B-spline associated with the 
operator L. Ideally, we would like it to be maximally localized. 

To give more insight, let us consider L = D and Ld = Dd 
(the finite-difference operator associated to D). Then, the as- 
sociated B-spline is Pb(x) — Ddl+(:c) = l+(a;) — — 1) 
where is the unit step (Heaviside) function. Hence, 

Pt)(x) — rectfx — s) is a causal rectangle function (poly- 
nomial B-spline of degree 0). 

The practical consequence of ( [17} is 



u = L d s = L d L w — /?l * w. 



(18) 
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Since (p h *w)(x) = (w,^(--x)) where ft(x) = (3 L (-x) 
is the space-reversed version of /3l, it can be inferred from ( fT8| ) 
that the evaluation of the samples of L^s is equivalent to the 
observation of the innovation through a B-spline window. 

From a system-theoretic point of view, Ld is understood as 
a finite impulse response (FIR) filter. This impulse response 
is of the form J^kefi d[k]5(- — k) with some appropriate 
weights d[k]. Therefore, we write the discrete counterpart of 
the continuous-domain innovation variable as 

u[k) = L d s(x)\ x=k = ^ d[k']s(k - k'). 
fc'eo 

This allows us to write in matrix-vector notation the discrete- 
domain version of the innovation model ( fi4] > as 



u = Ls, 



(19) 



where s = {s[k]) ken represents the discretization of the 
stochastic model with s[k] = s(x)\ x=k for k £ f2, L 



is the matrix representation of L^, and u = (u[k]) kG(1 is 
the discrete innovation vector. 

We shall now rely on ( fT6] l to derive the pdf of the discrete 
innovation variable, which is one of the key results of this 
paper. 

Theorem 3. Let s be a stochastic process whose characteristic 
form is given by \\(>) where f is a p-admissible Levy exponent, 
and /3l = L d L~M € L p (R d ) for some p € (0,2]. Then, 
u = L(js is stationary and infinitely divisible. Its first-order 
pdf is given by 



Pu (u)= / exp^vH)^" 



dw 

27? 



(20) 



with Levy exponent 



fa(w) = logpu(u>) = / f{u0l{x)) dx, (21) 
which is p-admissible as well. 



Proof: Taking ( [T8) l into account, we derive the character- 
istic form of u which is given by 

& u (<p) = E^'ri} = E{^* w ^ } } = E{<^ w 'ft*<P)} 

= exp (J f(0£ * ip{xj) dx\ . (22) 

The fact that u is stationary is equivalent to 3P u (ip) = 
& u (lp(- — Xoj) for any Xq £ M. d , which is established by a 
simple change of variable in ( 22 1. We now consider the random 
variable U = (u,S) = (w,f3^). Its characteristic function is 
obtained as 



Pu {uj) =E{e Jt ^} =E{e- 
= exp (f p v (u>)) 



} 



where the substitution ip = ujf3^ in & w (ip) is valid since 
£P W is a continuous functional on L p (R d ) as a consequence 



of the p-admissibility condition. To prove that fpw (uj) is a p- 
admissible Levy exponent, we start by establishing the bound 



p > 



> 



\f(u>0£(x))\ dx 



+ M 



| /ay Ml 



\f(wft(x))<p(x)\ dx 



/sy (w) 



(23) 



which follows from the p-admissibility of /. We are also 
relying on Lebesgue's dominated convergence theorem to 
move the derivative with respect to lu inside the integral 
that defines /^v(w). In particular, (23 1 implies that /^v is 



continuous and vanishes at the origin. The last step is to 
establish its conditional positive definiteness which is achieved 
by interchanging the order of summation. We write 

JV N 

Y] Y] fn? (w m - w„)£ m £„ = 



(24) 



N N 

EE 

m— 1 n—1 



f{u m 0l{x) ~ u n 0£(x))£ m £ n dx > 



under the condition 53m=i — for every possible choice 
of lux,..., uj n 61, £l,...,6v e C, and N e N\ {0}. ■ 
The direct consequence of Theorem [3] is that the primary 
statistical features of u is directly related to the continuous- 
domain innovation process w via the Levy exponent. This 
implies that the sparsity structure (tail behavior of the pdf 
and/or presence of a mass distribution at the origin) is pri- 
marily dependent upon /. The important conceptual aspect, 
which follows from the Levy-Schoenberg theorem, is that the 
class of admissible pdfs is restricted to the family of i.d. laws 
since fpv(u>), as given by pTj ), is a valid Levy exponent. We 
emphasize that this result is attained by taking advantage of 
considerations in the continuous-domain. 

C. Specific Examples 

We now would like to illustrate our formalism by presenting 
some examples. If we choose L = D, then the solution of ( fl4| ) 
with the boundary condition s(0) = is given by 

s(x) = / w{x')dx' 



and is a Levy process. It is noteworthy that the Levy 
processes — a fundamental and well-studied family of stochas- 
tic processes — include Brownian motion and Poisson pro- 
cesses which are commonly used to model random physical 
phenomena |21 . In this case, /3d (x) — rect(x — |) and the 



discrete innovation vector u is obtained by 

u[k] = (w,rect(- + J — k)), 



representing the so-called "stationary independent incre- 
ments". Evaluating ( |2T] i together with /(0) = (see Defi- 
nition fTJ, we obtain 



/^M= / /Md* = /(«). 
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In particular, we generate Levy processes with Laplace- 
distributed increments by choosing /(cj) = log( r2 ^_^ 2 ) 
with the scale parameter r > 0. To see that, we write 
exp(/ | gv(o;)) = pu(u) = T 2 T +U 2 via pTj ). The inverse Fourier 
transform of this rational function is known to be 



Pu{u) 



-t\u\ 



Also, we rely on Theorem [3] in a more general aspect. For 
instance, a special case of interest is the Gaussian (nonsparse) 
scenario where /Gauss(^) = — iM 2 - Therefore, one gets 
f^{uj) = logpu(uj) = -iw 2 ||/3 L ||i from (|2T]l. Plugging 
this into ( |20) , we deduce that the discrete innovation vector 
is zero-mean Gaussian with variance ||/3l||! (i- e -> Pu( u ) = 
A^(0,||/3 L ||i)). 

Additionally, when f(uj) = g with a e [1, 2], one finds 

that fpv(w) = logpu(u) = -^-\\Ph\\t a - This indicates 
that u is a symmetric a-stable (SaS) distribution with scale 
parameter sq = ||/3l|Il ■ F° r a = 1, we have the Cauchy 
distribution (or Student's with r = 1/2). For other i.d. laws, 
the inverse Fourier transformation ( pO) is often harder to 
compute analytically, but it can still be performed numerically 
to determine pu(u) (or its corresponding potential function 
$[/ = — \ogpu ). In general, pjj will be i.d. and will typically 
imply heavy tails. Note that heavy-tailed distributions are 
compressible (24). 



IV. Bayesian Estimation 

We now use the results of Section [III] to derive solutions 
to the reconstruction problem in some well-defined statistical 
sense. To that end, we concentrate on the MAP solutions 
that are presently derived under the decoupling assumption 
that the components of u are independent and identically 
distributed (i .i.d.)- In order to reconstruct the signal, we seek 
an estimate of s that maximizes the posterior distribution ps\y 
which depends upon the prior distribution p$, assumed to be 
proportional to pu (since u = Ls). The direct application of 
Bayes' rule is 



Ps\y(s I y) oc p N (y - Hs)pu(u) 

|y-Hsj| 2 



oc exp 



2a 2 



]Jpu(M k ) 



ken 



Then, we write the MAP estimation for s as 



smap = argmaxps|y(s | y) 



argmin |||Hs 



■ ' ^"'E <l> ' ( ljSk) J ' 

ken I 

(25) 



where $>u{ x ) = —\°gPu{ x ) is called the potential function 
corresponding to pjj- Note that ( p5| is compatible with the 
standard form of the variational reconstruction formulation 
given in Q. In the next section, we focus on the potential 
functions. 



A. Potential Functions 

Recall that, in the current Bayesian formulation, the po- 
tential function $u( x ) = —^°gPu( x ) is specified by the 
Levy exponent fpv which is itself in direct relation with 
the continuous-domain innovation w via pTj ). For illustration 
purposes, we consider three members of the i.d. family: 
Gaussian, Laplace, and Student's (or, equivalently, Cauchy) 
distributions. We provide the potential functions for these 
priors in Table|I] The exact values of the constants C%, C2, and 
C3 and the positive scaling factors Mi, M2, and M3 have been 
omitted since they are irrelevant to the optimization problem. 
On one hand, we already know that the Gaussian prior does 
not correspond to a sparse reconstruction. On the other hand, 
the Student's prior has a slower tail decay and promotes 
sparser solutions than the Laplace prior. Also, to provide a 
geometrical intuition of how the Student's prior increases the 
sparsity of the solution, we plot the potential functions for 
Gaussian, Laplacian, and Student's (with e = 10~ 2 ) estimators 
in Figure [2] By looking at Figure [2] we see that the Student's 
estimator penalizes small values more than the Laplacian or 
Gaussian counterparts do. Conversely, it penalizes the large 
values less. 

Let us point out some connections between the general 
estimator ( |2~5j ) and the standard variational methods. The first 
quadratic potential function (Gaussian estimator) yields the 
classical Tikhonov-type regularizer and produces a stabilized 
linear solution, as explained in Section [I] The second potential 
function (Laplace estimator) provides the £i-type regularizer. 
Moreover, the well-known TV regularizer J7| is obtained if the 
operator L is a first-order derivative operator. Interestingly, the 
third log-based potential (Student's estimator) is linked to the 
limit case of the l v relaxation scheme as p — > 
see the relation, we note that minimizing lim p ^o Y^i 



equivalent to minimizing lim^o 
in (231, it holds that 



As pointed out 



i i i 

<i^> g ( a; ? + «) (26) 

i 

for any n > 0. The key observation is that the upper- 
bounding log-based potential function in |26| is interpretable 
as a Student's prior. This kind of regularization has been 
considered by different authors (see |26]-|28] and also [29] 
where the authors consider a similar log-based potential). 

V. Reconstruction Algorithm 

We have now the necessary elements to derive the general 
MAP solution of our reconstruction problem. By using the 
discrete innovation vector u as an auxiliary variable, we recast 
the MAP estimation as the constrained optimization problem 

s M ap= argmin I -||Hs - y\\j + a 2 V <S> V {u[k}) ) 
seR K \ I ' / 

\ ken / 



subject to u = Ls. 



(27) 
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TABLE I 

Four members of the family of infinitely divisible distributions and the corresponding potential functions. 



Gaussian 
Laplace 

Student's 
Cauchy 



PuW) 



(TO v 2tt 



Property 



B{r,\) \ (x/e) 2 + l 



vrs (x/s )' 2 +l 



Mix 2 + C\ smooth, convex 

+ C*2 nonsmooth, convex 

Malog ( ^—^r 1 + C3 smooth, nonconvex 

x 2 +s 2 

Iog( 2-2-) + C4 smooth, nonconvex 




(a) 



(h) 



Fig. 2. Potential functions (a) and the corresponding proximity operators (b) of different estimators: Gaussian estimator (dash-dotted), Laplacian estimator 
(dashed), and Student's estimator with e = 10 -2 (solid). The multiplication factors are set such that &u(l) = 1 for all potential functions. 



This representation of the solution naturally suggests using 
the type of splitting-based techniques that have been em- 
ployed by various authors for solving similar optimization 
problems p0|-p2). Rather than dealing with a constrained 
optimization problem directly, we prefer to formulate an 
equivalent unconstrained problem. To that purpose, we rely 
on the augmented-Lagrangian method [33 1 and introduce the 
corresponding augmented Lagrangian (AL) functional of ( pTj ) 
given by 

C A (s,xi, a ) = ^\\Us~y\\ 2 2 + a 2 J2^u(u[k}) 

ken 

+a T (Ls - u) + 7^||Ls - u||f, 

where a £ ~R K denotes the Lagrange-multiplier vector and 
fj, £ R is called the penalty parameter. The resulting opti- 
mization problem takes of the form 

min £^(s, u. a). (28) 

To obtain the solution, we apply the alternating-direction 
method of multipliers (ADMM) |34| that replaces the joint 
minimization of the AL functional over (s, u) by the partial 
minimization of Ca( s i u, a.) with respect to each independent 
variable in turn, while keeping the other variable fixed. These 
independent minimizations are followed by an update of the 
Lagrange multiplier. In summary, the algorithm results in the 



following scheme at iteration t: 

u t+1 <- argmin £^(s*,u, a*) (29a) 

u 

s t+1 ^- argmin C A (s, u t+1 , a') (29b) 

S 

a t+1 =a* + ^(Ls* +1 -u* +1 ). (29c) 

From the Lagrangian duality point of view, ( |29c| i can be 
interpreted as the maximization of the dual functional so that, 
as the above scheme proceeds, feasibility is imposed [34|. 

Now, we focus on the sub-problem ( |29a[ ). In effect, we see 
that the minimization is separable, which implies that ( |29a[ ) 
reduces to performing K scalar minimizations of the form 

min (a 2 ® v {u\k]) + - Ou[k] - z[k}) 2 ) , Vfc G fl, (30) 

«[fc]eR V 2 / 

where z — Ls + ocj [l. One sees that pO) is nothing but the 
proximity operator of <&[/(•) that is defined below. 

Definition 4. The proximity operator associated to the func- 
tion \$u{ ) with A £ M. + is defined as 

prox^, (y; A) = argmin -{y - x) 2 + \®u(x). (31) 

Consequently, ( |29a| > is obtained by applying prox $[/ (z; s —) 
in a component-wise fashion to z = Ls* + of / p,. The closed- 
form solutions for the proximity operator are well-known for 
the Gaussian and Laplace priors. They are given by 

prox(.) 2 (z; A) = z(l + 2A)~\ (32a) 
prox|.| (2; A) = max(|z| — A, O)sgn(z), (32b) 
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(a) (b) (c) 



Fig. 3. Images used in deconvolution experiments: (a) stem cells surrounded by goblet cells; (b) nerve cells growing around fibers; (c) artery cells. 

TABLE II 

Deconvolution performance of MAP estimators based on different prior distributions. 





BSNR (dB) 


Gaussian 


Laplace 


Student's 


Stem cells 


20 


14.43 


13.76 


11.86 


Stem cells 


30 


15.92 


15.77 


13.15 


Stem cells 


40 


18.11 


18.11 


13.83 


Nerve cells 


20 


13.86 


15.31 


14.01 


Nerve cells 


30 


15.89 


18.18 


15.81 


Nerve cells 


40 


18.58 


20.57 


16.92 


Artery cells 


20 


14.86 


15.23 


13.48 


Artery cells 


30 


16.59 


17.21 


14.92 


Artery cells 


40 


18.68 


19.61 


15.94 



respectively. The proximity operator has no closed-form so- 
lution for the Student's potential. For this case, we propose 
to precompute and store it in a lookup table (LUT) (cf. 



Figure 2(b) I. This idea suggests a very fast implementation 
of the proximal step which is applicable to the entire class of 
i.d. potentials. 

We now consider the second minimization problem ( |29b| >, 
which amounts to the minimization of a quadratic problem for 
which the solution is given by 



(H T H + /iL T L) 



H J 



AtL T 



(33) 

Interestingly, this part of the reconstruction algorithm is equiv- 
alent to the Gaussian solution given in |4} and (j5j. In general, 
this problem can be solved iteratively using a linear solver such 
as the conjugate-gradient (CG) method. Also in some cases, 
the direct inversion is possible. For instance, when H T H has a 
convolution structure, as in some of our series of experiments, 
the direct solution can be obtained by using the FFT |35|. 

We conclude this section with some remarks regarding the 
optimization algorithm. We first note that the method remains 
applicable when $j/(x) is nonconvex, with the following 
caveat: When the ADMM converges and &u is nonconvex, 
it converges to a local minimum, including the case where 
the sub-minimization problems are solved exactly |34|. As 
the potential functions considered in the present context are 
closed and proper, we stress the fact that if $[/ : E — > K + 
is convex and the unaugmented Lagrangian functional has 
a saddle point, then the constraint in ( |2"7j ) is satisfied and 
the objective functional reaches the optimal value as i 4 



oo (34). Meanwhile, in the case of a nonconvex problems, 
the algorithm can potentially get trapped in a local minimum 
in the very early stages of the optimization. It is therefore 
recommended to apply a deterministic continuation method or 
to consider a warm start that can be obtained by solving the 
problem first with Gaussian or Laplace priors. We have opted 
for the latter solution as an effective remedy for convergence 
issues. 

VI. Numerical Results 

In the sequel, we illustrate our method with some con- 
crete examples. We concentrate on three different imaging 
modalities and consider the problems of deconvolution, MR 
image reconstruction from partial Fourier coefficients, and 
image reconstruction from X-ray tomograms. For each of 
these problems, we present how the discretization paradigm 
is applied. In addition, our aim is to show that the adequacy 
of a given potential function is dependent upon the type of 
image being considered. Thus, for a fixed imaging modality, 
we perform model-based image reconstruction, where we 
highlight images that suit well to a particular estimator. For 
all the experiments, we choose L to be the discrete-gradient 
operator. As a result, we update the proximal operators in 
Section [V] to their vectorial counterparts. The regularization 
parameters are optimized via an oracle to obtain the highest- 
possible SNR. The reconstruction is initialized in a systematic 
fashion: The solution of the Gaussian estimator is used as 
initial solution for the Laplace estimator and the solution of 
the Laplace estimator is used as initial solution for Student's 
estimator. The e parameter for Student's estimator is set to 
1(T 2 . 
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(a) (b) (c) 

Fig. 4. Data used in MR reconstruction experiments: (a) cross section of a wrist; (b) angiography image; (c) k-space sampling pattern along 40 radial lines. 

TABLE III 

MR IMAGE RECONSTRUCTION PERFORMANCE OF MAP ESTIMATORS BASED ON DIFFERENT PRIOR DISTRIBUTIONS. 





Gaussian 


Laplace 


Student's 


Wrist (20 radial lines) 


8.82 


11.8 


5.97 


Wrist (40 radial lines) 


11.30 


14.69 


13.81 


Angiogram (20 radial lines) 


4.30 


9.01 


9.40 


Angiogram (40 radial lines) 


6.31 


14.48 


14.97 



A. Image Deconvolution 

The first problem we consider is the deconvolution of 
microscopy images. In deconvolution, the measurement func- 
tion in (jTTJ corresponds to the shifted version of the point- 
spread function (PSF) of the microscope on the sampling grid: 
ipm(x) — ip D (x — x m ) where ip D represents the PSF. We 
discretize the model by choosing ifi n t(x) = sinc(a;) with 
fkix) — </?int(a; — Xk). The entries of the resulting system 
matrix H are given by 



[H 



m , h 



(C(')-sinc(' - £C fe )>, 



(34) 



In effect, ([34]) corresponds to the samples of the band-limited 
version of the PSF. 

We perform controlled experiments, where the blurring of 
the microscope is simulated by a Gaussian PSF kernel of 
support 9x9 and standard deviation = 4, on three 
microscopic images of size 512 x 512 that are displayed in 
Figure [3] In Figure 3(a) we show stem cells surrounded by 
numerous goblet cells. In Figure 3(b) we illustrate nerve cells 
growing along fibers, and we show in Figure |3(c)| bovine 
pulmonary artery cells. 

For deconvolution, the algorithm is run for a maximum of 
500 iterations, or until the relative error between the successive 
iterates is less than 5 x 10~ 6 . Since H is block-Toeplitz, it can 
be diagonalized by a Fourier transform under suitable bound- 
ary conditions. Therefore, we use a direct FFT-based solver 
for ( (33~| . The results are summarized in Table [II] where we 
compare the performance of three regularizes for the different 
blurred SNR (BSNR) levels defined as BSNR = var(Hs)/cr 2 . 

We conclude from the results of Table [TT] that the MAP 
estimator based on a Laplace prior yields the best performance 
for images having sharp edges with a moderate amount of 
texture, such as those in Figures 3(b)|3(c) This confirms the 
observation that, by promoting solutions with sparse gradient, 



it is possible to improve the deconvolution performance. 
However, enforcing sparsity too heavily, as is the case for 
Student's priors, results in a degradation of the deconvolution 
performance for the biological images considered. Finally, for 



a heavily textured image like the one found in Figure 3(a) 



image deconvolution based on Gaussian priors yields the 
best performance. We note that the derived algorithms are 
compatible with the methods commonly used in the field (e.g., 



Tikhonov regularization |36| and TV regularization |37|). 



B. MRI Reconstruction 

We consider the problem of reconstructing MR images from 
undersampled fe-space trajectories. The measurement function 
represents a complex exponential at some fixed frequencies 
and is defined as ip^(x) — e 27r j( fem,!B ' where k m represents 



the sample point in fe-space. As in Section VI-A we choose 
( / ? int(f) = sinc(a;) for the discretization of the forward model, 
which results in a system matrix with the entries 



rn.k 



(V'm( a; )> sinc (- - x k)) 
e -j27r(fe m!a5fc ) if | fe 



< i 
m |oo _ 2 * 



(35) 



The effect of choosing a sine function is that the system 
matrix reduces to the discrete version of complex Fourier 
exponentials. 

We study the reconstruction of the two MR images of size 
256 x 256 illustrated Figure [4] — a cross-section of a wrist is 
displayed in the first image, followed by an MR angiography 
image — and consider a radial sampling pattern in fe-space (cf. 



Figure 4(c) i. 

The reconstruction algorithm is run with the stopping cri- 
teria set as in Section IVI-AI and an FFT-based solver is used 



for p3\ . We show in Table III the reconstruction performance 
of our estimators for different number of radial lines. 
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(a) (b) 
Fig. 5. Images used in X-ray tomographic reconstruction experiments: (a) the Shepp-Logan (SL) phantom; (b) cross section of the lung. 

TABLE IV 

Reconstruction results of X-ray computed tomography using different estimators. 





Gaussian 


Laplace 


Student's 


SL Phantom (120 direction) 


16.8 


17.53 


18.76 


SL Phantom (180 direction) 


18.13 


18.75 


20.34 


Lung (180 direction) 


22.49 


21.52 


21.45 


Lung (360 direction) 


24.38 


22.47 


22.37 



On one hand, the estimator based on Laplace priors yield 
the best solution in the case of the wrist image, which has 
sharp edges and some amount of texture. Meanwhile, the 
reconstructions using Student's priors are suboptimal because 
they are too sparse. This is similar to what was observed 
with microscopic images. On the other hand, Student's priors 
are quite suitable for reconstructing the angiogram, which is 
mostly composed of piecewise-smooth components. We also 
observe that the performance of Gaussian estimators is not 
competitive for the images considered. Our reconstruction 
algorithms are tightly linked with the deterministic approaches 



used for MRI reconstruction including TV [38] and log-based 
reconstructions 



C. X-Ray Tomographic Reconstruction 

X-ray computed tomography (CT) aims at reconstructing 
an object from its projections taken along different directions. 
The mathematical model of a conventional CT is based on the 
Radon transform 



9e m {t r , 



s(x)6{t m - 



(x,0 m })dx. 



where s(x) is the absorption coefficient distribution of the 
underlying object, t m is the sampling point and 6 m = 
(cos(6* m ), sm(9 m )) is the angular parameter. Therefore, the 
measurement function ip^x) = S(t m — (x,9 m )) denotes an 
idealized line in M 2 perpendicular to 6 m . In our formulation, 
we represent the absorption distribution in the space spanned 
by the tensor product of two B-splines 



s(x) = y%[k]</? ilrt (tE - k) , 



where (p- ln t(x) = tri(a;i)tri(a;2), with tri(x) = (1 — \x\) 
denoting the linear B-spline function. The entries of the 
system matrix are then determined explicitly using the B- 
spline calculus described in [40|, which leads to 



m , k 



{8{t T , 
A? 



A 2 

,1^1 sin 



3! 

_ f(t)-f{t-h) 



}),ip int (x 
M ( _ 



k)) 



k,e, 



where A/j/(£) = jyL> J h {L " ; is the finite-difference operator, 
Ay/(i) is its n-fold iteration, and t + = max(0,i). This 
approach provides an accurate modeling, as demonstrated 
in J40) where further details regarding the system matrix and 
its implementation are provided. 

We consider the two images shown in Figure [5] The Shepp- 
Logan (SL) phantom has size 256 x 256, while the cross 
section of the lung has size 750 x 750. In the simulations 
of the forward model, the Radon transform is computed along 
180 and 360 directions for the lung image and along 120 and 
180 directions for the SL phantom. The measurements are 
degraded with the Gaussian noise such that the signal-to-noise 
ratio is 20 dB. 

For the reconstruction, we solve the quadratic minimization 
problem ( f33"j ) iteratively by using 50 CG iterations. The 



reconstruction results are reported in Table IV 



The SL phantom is a piecewise-smooth image with sparse 
gradient. We observe that the imposition of more sparsity 
brought by Student's priors significantly improves the recon- 
struction quality for this particular image. On the other hand, 
we find that the Gaussian priors for the lung image outperform 
the other priors. Like the deconvolution and MRI problems, 



our algorithms are in line with Tikhonov-type |4T| and TV 1 42 1 
reconstructions used for X-ray CT. 
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D. Discussion 

As our experiments on different types of imaging modalities 
have revealed, sparstity-promoting reconstructions are pow- 
erful methods for solving biomedical image reconstruction 
problems. However, encouraging sparser solutions does not 
always yield the best reconstruction performance and non- 
sparse solutions provided by Gaussian priors still yields better 
reconstructions for certain images. The efficiency of a potential 
function is primarily dependent upon the type of image being 
considered. In our model, this is related to the Levy exponent 
of the underlying continuous-domain innovation process w 
which is in direct relationship with the signal prior. 

VII. Conclusion 

The purpose of this paper has been to develop a practical 
scheme for linear inverse problems by combining a proper 
discretization method and the theory of continuous-domain 
sparse stochastic processes. On the theoretical side, an impor- 
tant implication of our approach is that the potential functions 
cannot be selected arbitrarily as they are necessarily linked 
to infinitely divisible distributions. The latter puts restrictions 
but also constitutes the largest family of distributions that is 
closed under linear combinations of random variables. On the 
practical side, we have shown that the MAP estimators based 
on these prior distributions cover the current state-of-the-art 
methods in the field including £i-type regularizers. The class 
of said estimators is sufficiently large to reconstruct different 
types of images. 

Another interesting observation is that we face an optimiza- 
tion problem for MAP estimation that is generally noncon- 
vex, with the exception of the Gaussian and the Laplacian 
priors. We have proposed a computational solution, based 
on alternating-direction method of multipliers, that applies 
to arbitrary potential functions by suitable adaptation of the 
proximity operator. 

In particular, we have applied our framework to deconvo- 
lution, MRI, and X-ray tomographic reconstruction problems 
and have compared the reconstruction performance of different 
estimators corresponding to models of increasing sparsity. 

In basic terms, our model is composed of two fundamental 
concepts: the whitening operator L, which is in connection 
with the regularization operator, and the Levy exponent /, 
which is related to the prior distribution. A further advantage 
of continuous-domain stochastic modeling is that it enables us 
to investigate the statistical characterization of the signal in any 
transform domain. This observation designates key research 
directions: (1) the identification of the optimal whitening 
operators and (2) the proper fitting of the Levy exponent of 
the continuous -domain innovation process w to the class of 
images of interest. 
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